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Abstract. Multiple time scale stochastic dynamical systems are ubiquitous in science and engineering, and 
the reduction of such systems and their models to only their slow components is often essential for 
scientific computation and further analysis. Rather than being available in the form of an explicit 
analytical model, often such systems can only be observed as a data set which exhibits dynamics 
on several time scales. We will focus on applying and adapting data mining and manifold learning 
techniques to detect the slow components in such multiscale data. Traditional data mining methods 
are based on metrics (and thus, geometries) which are not informed of the multiscale nature of the 
underlying system dynamics; such methods cannot successfully recover the slow variables. Here, we 
present an approach which utilizes both the local geometry and the local dynamics within the data 
set through a metric which is both insensitive to the fast variables and more general than simple 
statistical averaging. Our analysis of the approach provides conditions for successfully recovering 
the underlying slow variables, as well as an empirical protocol guiding the selection of the method 
parameters. 
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1. Introduction. Dynamical systems of engineering interest often contain several dis¬ 
parate time scales. When the evolving variables are strongly coupled, resolving the dynamics 
at all relevant scales can be computationally challenging and pose problems for analysis. Of¬ 
ten, the goal is to write a reduced system of equations which accurately captures the dynamics 
on the slow time scales. These reduced models can greatly accelerate simulation efforts, and 
are more appropriate for integration into larger modeling frameworks. 

Following the methods of Mori [28], Zwanzig [44], and others [5, 8, 20], one can reduce the 
number of variables needed to describe a system of differential equations. However, in general, 
this reduction introduces memory terms. It transforms a system of differential equations into 
a system of (lower-dimensional) integro -differential equations, so that the reduction of the 
number of variables is counterpoised by the increased complexity of the reduced model. Here, 
we will study the special case of evolution equations which contain an inherent time scale 
separation; in this case, it is possible, in principle, to obtain a reduced system of differential 
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equations in only the slow variables without memory terms. Such an analysis crucially hinges 
on knowing in which variables (or, more generally, functions of variables) one can write such 
a reduced system of slow evolution equations. 

Moving averages and subsampling have often been used in simple cases as appropriate 
functions of variables in which to formulate slow lower-dimensional models [29]. However, 
if the underlying dynamics are sufficiently nonlinear, such statistics may fail to capture the 
relevant structures and time scales within the data (see Figure 1 for a schematic illustration). 
For well-studied systems, one often has some a priori knowledge of the appropriate observables 
(such as phase held variables) with which to formulate the reduced dynamics [7, 42], However, 
such observables may not be immediately obvious upon inspection for new complex systems, 
and so we require an automated approach to construct such slow variables. 

Given an explicit system of ordinary differential equations, one can make numerical ap¬ 
proximations, such as the quasi-steady state approximation [32] or the partial equilibrium 
approximation [17], to reduce the system dimensionality without introducing memory terms. 
There has been some recent analytical work on extending and generalizing such ideas to more 
complex systems of equations [1, 6, 11, 12, 19, 29, 36]. However, in many instances, closed 
form, analytical models are not given explicitly, but can only be inferred from simulation 
and/or experimental data. We therefore turn to data-driven techniques to analyze such sys¬ 
tems and uncover the relevant dynamical modes. In particular, we will use a manifold-learning 
based approach, as such methods can accommodate nonlinear structures in high-dimensional 
data. 

The core of most manifold learning methods is having a notion of similarity between data 
points, usually through a distance metric [2, 9, 10, 30, 40]. The distances are then integrated 
into a global parametrization of the data, typically through the solution of an eigenprob- 
lem. In this paper, we will analyze multiple time scale stochastic dynamical systems using 
data-driven methods. Standard “off-the-shelf” manifold learning techniques which utilize the 
Euclidean distance are not appropriate for analyzing data from such multiscale systems, since 
this metric does not account for the disparate time scales. Research efforts have addressed the 
construction of more informative distance metrics, which are less sensitive to noise and can 
better recover the true underlying structure in the data by suppressing unimportant sources 
of variability [4, 18, 31, 33, 43]. The Mahalanobis distance is one such metric. It was shown 
that the Mahalanobis distance can remove the effect of observing the underlying system vari¬ 
ables through a complex, nonlinear function [13, 34, 37]. Here, we will show the analogy 
between removing the effects of such nonlinear observation functions (in the context of data 
analysis), and reducing a dynamical system to remove the effects of the fast variables. Our 
approach will build a parametrization of the data which is consistent with the underlying 
slow variables. Because our approach is data-driven, we require no explicit description of the 
model, and can extract the underlying slow variables from either simulation or experimental 
data. Furthermore, the approach implicitly identifies the slow variables within the data and 
does not require any a priori knowledge of the fast or slow variability sources. Even when 
the underlying dynamical system is complex with nonlinear coupling between the fast and 
slow variables, we will show that our approach has the potential to isolate the underlying slow 
modes. 

We will present detailed analysis for our method, and provide conditions under which it will 
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(a) 


(b) 


Figure 1: (a) Schematic of a two-dimensional, two-time scale {t\ = 50 and T 2 = 2) ordinary 
differential equation system where the value of X 2 becomes slaved to the value of x\. In such 
an example, traditional data mining algorithms are sufficient to recover the slow variable, (b) 
Schematic of a two-scale two-dimensional stochastic dynamical system where the statistics 
of X 2 become slaved to x\. In such an example, traditional data mining algorithms will not 
recover the slow variable if the variance in the fast variable is too large. 


successfully recover the slow variables. Furthermore, based on this analysis, we will present 
data-driven protocols to tune the parameters of the method appropriately. Our presentation 
and discussion will address two-time-scale stochastic systems; however, we claim that our 
framework and analysis readily extends to systems with multiple time scale separations. 

2. Multiscale Stochastic Systems. Consider the following two-time-scale system of stochas¬ 
tic differential equations (SDEs), 

dxi{t ) = aj(x(t))di + dWi(t), 1 < i < m 

^ ^ dxi(t ) = ———^-^-dt -\ — —dWi(t), m + 1 < i < n 

e v e 

where W{(t) are independent standard Brownian motions, x(f) = [x±(t) ■ ■ ■ x n (t)] T € M n , 

and t<l. In the simple case of a linear drift function, i.e., when a,(x(f)) = mxi with /ij < 0, 
the probability density function of x t approaches a Gaussian with (finite) variance . The time 
constant of the approach of the variance to equilibrium is — 1/Hi for i = 1 ,,m and — e///j for 
i = m+ 1,..., n [25]. Thus, the last n—m variables of (2.1) rapidly approach a local equilibrium 
measure and exhibit fast dynamics, while the first m variables exhibit slow dynamics. A short 
burst of simulation will yield a cloud of points which is broadly distributed in the fast directions 
but narrowly distributed in the slow ones. With the appropriate conditions on cq(x), the same 
can be said for more general drift functions, where ^ are the eigenvalues of the Jacobian of 
a(x) = [ai(x) ••• a„(x)] [41]. Therefore, (2.1) defines an n-dimensional stochastic system 
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with m slow variables and n — m fast variables, and e defines the time scale separation. The 
ratio of the powers of e in the drift and diffusion terms in (2.1) is essential, as we require the 
square of the diffusivity to be of the same order as the drift as e —>• 0 [3]. If the diffusivity 
is larger, then, as e —» 0, the equilibrium measure will be unbounded. Conversely, if the 
diffusivity is smaller, the equilibrium measure will go to 0 as e —>• 0. 

Assuming the sample average of a*(x) converges to a distribution which is only a function 
of the slow variables, then by the averaging principle [16], we can write a reduced SDE in 
only the slow variables aq, ,x m . The aim of our work is to show how we can detect such 
slow variables automatically from data, in order to help inform modeling efforts and aid in 
the writing of such reduced stochastic models. In general, we are not given the variables 
x(t) from the original SDE system, but instead, we are given some observations in the form 
y(t) = f (x(t)). We assume that f : M n H > M rf , n < d, is a deterministic (possibly nonlinear) 
function whose image is an n-dimensional manifold Ai in M d . For our analysis, we require 
g = f _1 to be well-defined on Ai, and both f and g to be continuously differentiable to fourth 
order. Given data y(t ±),..., y(Uv) on Ai we would like to recover a parametrization of the 
data that is one-to-one with the slow variables aq,..., x m . 

3. Local Invariant Metrics. In order to recover the slow variables from data, we will utilize 
a local metric that collapses the fast directions. Typically, such a metric averages out the fast 
variables. However, simple averages are inadequate to describe data which is observed through 
a complicated nonlinear function. Instead, we propose to use the Mahalanobis distance, which 
measures distances normalized by the respective variances in each local principal direction. 
Using this metric, we still retain information about both the fast and slow directions and can 
more clearly observe complex dynamic behavior within the data set. 

If two points x(U) and x(i 2 ) are drawn from an n-dimensional Gaussian distribution with 
covariance C x , the Mahalanobis distance between the points is defined as [26] 

(3.1) ||x(ti) - x(t 2 ) \\m = yj (x(U) - x(t 2 )) r Cx 1 (x(U) - x(t 2 )). 

In particular, for (2.1), whose covariance does not depend on x, C” 1 = diag(ei,..., e n ) is a 
constant matrix where 


(3.2) 


ei =1, 1 < i < m 

ei =e, m + 1 < i < n, 


and the Mahalanobis distance between samples is 


n 

(3.3) ||x(f 2 ) - x(U)||| f = ei ( Xi(t 2 ) - Xi(ti)) 2 . 

i= 1 

Note that in (3.3), the fast variables are collapsed and become 0(y/e) small, and so this metric 
is implicitly insensitive to variations in the fast variables. The metric (3.3) can be rewritten 
as 


(3.4) 


x(t 2 ) -x(U) f M = 


z(* 2 ) -z(ti)||! 
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where 

( 3 . 5 ) Zi(t) = yjelxi(t ). 

z (t) is a stochastic process of the same dimension as x(f), rescaled so that each variable has 
unit diffusivity. This rescaling transforms our problem from one of detecting the slow variables 
within dynamic data to one of traditional data mining. The Mahalanobis distance incorporates 
information about the dynamics and relevant time scales, so that using traditional data mining 
techniques with this metric will allow us to detect the slow variables in our data [ 35 ]. It is 
important to note that, in practice, we never construct z(t) explicitly. It was shown in [ 34 ] 
that, assuming f is bilipschitz, the Mahalanobis distance can be extended to approximate 
(to fourth order) the Euclidean distance between the rescaled samples z it) from accessible 
y(f) = f(x(£)), 

(3-6) \\y{t 2 ) - y(*i)||Af = ||z(i 2 ) - z(fi)||| + 0(\\y(t 2 ) - y(H)||f). 

This approximation is accurate when ||y(t 2 ) — y(£i)|| is small. Because we will integrate these 
distances into a manifold learning algorithm which only considers local distances, we can 
recover a parametrization of the data which is consistent with the underlying system variables 
x(t), even when the data are obscured by a function f. In Section 5 , we will show how we can 
approximate this distance directly from data y (t). 

4. Diffusion Maps for Global Parametrization. From pairwise distances, we want to 
extract a global parametrization of the data that represents the slow variables. We will 
use diffusion maps [9, 10], a kernel-based manifold learning technique, to extract a global 
parametrization using the local distances that we described in the previous section. Given 
data y(ii),... , y(ijv)j we first construct the kernel matrix W £ M. NxN , where 

lly(**) -y(t?)ll 2 

2 

®kernel 

Here, || • || denotes the appropriate norm (in our case, the Mahalanobis distance), and cr kernel 
is the kernel scale and denotes a characteristic distance within the data set. Note that crkemei 
induces a notion of locality: if ||y(£j) — y(tj)\\ 2> okernel, then Wij is negligible. Therefore, we 
only need our metric to be informative within a ball of radius a kernel- We then construct the 
diagonal matrix D € M. NxN , with 



(4.1) 


Wij = exp 


N 

( 4 . 2 ) Du = '£ Wij. 

3 = 1 

We compute the eigenvalues Ao,..., Ajv-i and eigenvectors (f>o,, 4 >n-i of the matrix A = 

T 

D _ 1 W, and order them such that 1 = Ao > |Ai| > • ■ ■ > |Ajv— 11- 4>o = [l 1 • ■ ■ l] is the 

trivial eigenvector; the next few eigenvectors provide embedding coordinates for the data, so 
that 4>j(i), the i th entry of 4>j , provides the j th embedding coordinate for y(U) (modulo higher 
harmonics which characterize the same direction in the data [15]). 
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Setting 6t in the Observation Domain 

jrf\ J 

# 

6t is sufficiently small 

Ellipse approximates 
local cloud of data 

• 

{Ak J 

♦ Approximating 

ellipse is inaccurate 
due to nonlinearities 


Setting <r kernel in the Transformed Domain 

® kernel • \ 

v *. 

^kernel *\ 

>• \ 

/ \ 

: \ 

- / \ . 

V 

— V 

^kernel ' s sufficiently small 

^kernel is to ° lar S e 

Curvature is small 

Curvature is large 

within this radius 

within this radius 


Figure 2: Illustration of how to choose St and (J kernel appropriately. Curvature effects and 
other nonlinearities should be negligible within a time window St and within a ball of radius 

Gkernel • 


5. Estimation of the Mahalanobis Distance. As previously mentioned, we do not have 
access to the original variables x(t) from the underlying original SDE system. Instead, we only 
have measurements y(t) = f(x(i)), and we want to estimate the Mahalanobis distance between 
the x variables from observations y(t). The traditional Mahalanobis distance is defined for a 
fixed distribution, whereas here we are dealing with a distribution that possibly changes as a 
function of position due to nonlinearities in the observation function f and in the drift a(x). 
Consequently, we use the following modified definition for the Mahalanobis distance between 
two points, 

(5.1) \\y(t 2 ) - y(ii)||M = ^(y(*2) - y(ti)) T (c t (y(ti)) + C t (y(i 2 ))) (y(t 2 ) - y(*i)), 

where C(y(t)) is the covariance of the observed stochastic process at the point y (f), and f 
denotes the Moore-Penrose pseudoinverse (since d may exceed n). 
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To motivate this definition of the Mahalanobis distance, we first consider the simple linear 
case where f(x) = Ax, with A £ M rfxn . The covariance of the observed stochastic process 
f(x) is given by C = AC^A 1 . Let A = UAV r be the singular value decomposition (SVD) of 
A, where U € M dxn , A € M nxn , and V € M nxn . The pseudoinverse of the covariance matrix 
is Cfi = UA~ 1 V t C“ 1 VA _1 U t . Consequently, the Mahalanobis distance (5.1) is reduced to 
(5.2) 

llyte) - y(ti)||M = (yte) - y(ii)) r c t ( y (t 2 ) - y(*i)) 

= (x(t 2 ) - x(ti)) T A T C~ 1 A(x(t 2 ) - x(ti)) 

= (x(t 2 ) - x(fi)) T VAU T UA _1 V T C“ 1 VA~ 1 U T UAV T (x(t 2 ) - x(ti)) 

= (x(t 2 ) - x(t 1 )) T C~ 1 (x(t 2 ) - x(h)) 

= ||x(t 2 ) - x(ti)\\ 2 M = II z (^) - z(ti)||l- 

Hence evaluating the Mahalanobis distances of the observations y(t) = f(x(t)) using (5.1) 
allows us to estimate the Euclidean distances of the rescaled variables z (in which the fast 
coordinates are collapsed). 

Following [34], we will show via Taylor expansion that the Mahalanobis distance between 
the observations (5.1) approximates the Euclidean distance in the rescaled variables for general 
nonlinear observation functions f (provided f is bilipschitz and both f and f" 1 are differen¬ 
tiable to fourth order). (5.1) cannot be evaluated directly since we do not have access to the 
covariance matrices, so we will instead estimate the covariances directly from data. We can 
estimate the covariance C(y(io)) empirically from a set of values y(ti), •.., y (t q ) drawn from 
the local distribution at y(to). One way to obtain such a set of points is to run q simulations 
for a short time, St, each starting from y(to). Alternatively, we can consider a single time 
series of length q5t starting from y(to), and then estimate the covariance from the increments 
Ay (ti) = y(ti) — y(tj_i). Although we will present analysis and results for the first type of 
estimation, the second case is often more practical in practice. 

Errors in our estimation of the Mahalanobis distance arise from three sources. One source 
of error is approximating the function f locally as a linear function by truncating the Taylor 
expansion of f at first order. An additional source of error arises from disregarding the drift 
in the stochastic process, and assuming that samples are drawn from a Gaussian distribution. 
The third source comes from finite sampling effects. In this work, we will address and discuss 
the first two sources of error (the finite sampling effects are the subject of future research). 
We can control the effects of the errors due to truncation of the Taylor expansion by adjusting 
& kernel, the higher-order terms in this expansion will be small for points which are close, 
such that adjusting cr kernel will allow us to only consider distances which are sufficiently 
accurate in our overall computation scheme. Furthermore, we can control the errors incurred 
by disregarding the drift by adjusting the time scale of our simulation bursts 5t. Figure 2 
illustrates some of the issues in choosing the sizes St (or qSt if the alternate method is used) 
and the parameter a kernel- We will present both analytical results for the error bounds, as well 
as an empirical methodology to set the parameters (Jkernei and St for our method to accurately 
recover the slow variable(s). 

5.1. Error due to the observation function f. We want to relate the distance in the 
rescaled space, ||z(t 2 ) — z(ti)|| 2 , to the estimated Mahalanobis distance between the observa- 
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tions ||y(t 2 ) — y(ti)||M- We define the error incurred by using the Mahalanobis distance to 
approximate the true distance as 

(5-3) E M (y(ti),y(t 2 )) = ||z(t 2 ) - z(ti)||i - ||y(i 2 )-y(^i)||M- 


By Taylor expansion of g (y) = f 1 (y) around y(ti) and y (t 2 ) and averaging the two expan¬ 
sions, we obtain 
(5.4) 

^ n d 

Em (y{h),y{t 2 )) =- EE {9i,{j)(y(ti))9i,(k,i)(y(.h)) - 9i,{j)(y(t2))gi,(k,i)(y(t2))) x 

i= 1 jkl= 1 

{yj(h) - VjihMvkih) ~ yk{ti))(yi{t 2 ) - yi(h)) 

^ n d 

+(E E {9i,{j,k)(y(.h))9i,(i : m){y(ti)) + 9i,(j,k)(y(t2))9i,(i,m)(yfa))) x 

i= 1 jklm= 1 

(%-(* 2 ) - yj(h))(y k {t 2 ) - y k (ti)){yi{t 2 ) - yi(t{))(y m (t 2 ) - y m (t i)) 

^ n d 

+jE E (5j,o)(y(^i))5i,(fe,i,m)(y(ii)) + 3j,oo(y(*2))9i,(fc,j,m)(y(*2))) * 

z=l jklm= 1 

(yj(* 2 ) - yj{h))(yk{t 2 ) - yk{h)){yi(t 2 ) - yi(h))(y m {t 2 ) - y m (ii)) 
+0 (||y(t 2 ) — y(ti)|||), 


% 

<% 
d 2 9i 
dyjdy k 
d 3 gi 

dyjdy k dyi 

In [34] , it was shown that the error incurred by using the Mahalanobis distance to approximate 
the L 2 -distance between points z (t) is 0(||yi — y 2 || 2 ) (see the Supplementary Materials for 
details). We now see from (5.4) that the error is an explicit function of the second- and higher- 
order derivatives of g = f _1 and the distance between samples ||y(i 2 ) — y(fi)|| 2 . We would 
like to note that this error does not depend on the dynamics of the underlying stochastic 
process (as we assume the covariances at each point on the manifold are known), but is 
only a function of the measurement function f. The parameter a kerne i in the diffusion maps 
calculation determines how much Em contributes to the overall analysis. From (4.1), distances 
which are much greater than cr kerne i are negligible in the diffusion maps computation because of 
the exponential kernel. Therefore, we want to choose p\ erne i on the order of ||y(t 2 ) — y(ti)\\ 2 M 
in a regime where \EM(y(ti),y(t 2 ))\ <C ||y(f 2 ) — y(ii)||^p This is illustrated in Figure 2, 
where we want to choose cr kerne i small enough so that the curvature and other nonlinear 
effects (captured in the error term Em) are negligible. This will ensure that the errors in the 
Mahalanobis distance approximation do not greatly effect our overall analysis. 


where 


(5.5) 


9i,{j) V e i 
9i,(j,k) yf&L 
9i,(j,k,l) yf&L 
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On first inspection, it would appear that our analysis indicates that akernel should be 
chosen arbitrarily small. However, to obtain a meaningful parametrization of the data set, 
there must be a nonnegligible number of data points within a ball of radius (Jkemel around 
each sample. Therefore, the sampling density on the underlying manifold provides a lower 
bound for a kernel¬ 
's.2. Error due to the dynamics. To compute the Mahalanobis distance in (5.1), we 
require C, the covariance of the observed stochastic process y (t) = f(x(i)). We will use 
simulation bursts to locally explore the dynamics on the manifold of observations in order 
to estimate the covariance at a point y (t) from data [39, 38]. We write the elements of the 
estimated covariance C(y (t),5t) as 

(5.6) 

Cij(y(t),5t ) = ^(E [yi(t + 6t)yj(t + 8t) | y(f)] -E [yi(t + 5t) | y(i)] E [yj{t + St) | y (t)}), 

where 5t > 0 is the length of the simulation burst. 

Due to the drift in the stochastic process and the (perhaps nonlinear) measurement func¬ 
tion f, we incur some error by approximating the covariance at a point y(f) using simulations 
of length 5t > 0. Define the error in this approximation as 

(5.7) E c (y (t),8t) = C(y (t),5t) - C(y (t)). 


By Ito-Taylor expansion of f and x(t) [24], 

(5.8) 

j n ^ r rt+St / rt+St rs 2 \ 

E c ,ij(x(t),8t) =— 5^/i )(fc) (x(i))E J (J 4(fc,o)(x(si))dsi + ^ f j ^ k) (x(s 1 ))ds 1 \ds 2 


k =1 


’S 2 

^ "• r rt+St / rt+St rS 2 \ 

+ ^Xj4(fe)( x W) E J [J A(fe, 0 )( x (si))^si + y /i,( 0 ,fc)( x (si))dsij ds 2 

1 - n . f / r s 2 

+ 77^ E / (/ /i,(fc,o( X ( S l)) dW »i.fc 

c fc,z=i v-'t 

+ 0(<5f 3/2 ) 


rs 2 


fj,(k,i)(x-(si))dW Sltk ds 2 


where 

, = 1 % 
f 1 

, 1 ( d ( ai(x) dfi\ 1 <9 3 /j \ 

= + ^dxkdx^J 

, = _j_ y- ( «z( x ) d 2 /) 1 d 3 fi \ 

V e i dxkdxi 2 eidx k d 2 xi)' 


(5.9) 
















10 


C. J. DSILVA ET AL 



Figure 3: Data, simulated from (6.1) with a = 3 and e = 10 3 , for 3000 time steps with 
dt = 10 -4 . The data are colored by time. 


From (5.8), the error in the covariance is O(St) (as the ds integrals are each 0(5t) and the 
dW integrals are each 0(Vdt)) and a function of the derivatives of the observation function f 
and the drift a. We want to set dt such that ||Ec|| <C ||C|| (this is illustrated in Figure 2), so 
that the estimated covariances are accurate. Note that in practice, we compute C by running 
many simulations of length 5t starting from x(t), and use the sample average to approximate 
the expected values in (5.6). We therefore incur additional error due to finite sampling; this 
error is ignored for the purposes of this analysis, and quantifying this error is the subject of 
future research. 

Our analysis reveals that the errors decrease with decreasing <5t; at first inspection, one 
would want to set 5t arbitrarily small to obtain the highest accuracy possible. However, often 
in practice, one cannot obtain an arbitrarily refined sampling rate, such that a smaller 5t 
results in fewer samples with which to approximate the local covariance.When also account¬ 
ing for these finite sampling errors, and one should take dt as long as possible while still 
maintaining negligable errors from the observation function f and the drift a. 

6. Illustrative Examples. For illustrative purposes, we consider the following two-dimensional 
SDE 


dx\(t) = adt+ dW\{t) 

dx 2 (t) = dt+ -^dW 2 {t) 

e V e 

where a is an 0(1) constant, as a specific example of (2.1). x\ is the slow variable, and x 2 is 
a fast noise whose equilibrium measure is bounded and O(l). Figure 3 shows data simulated 
from this SDE colored by time. We would like to recover a parametrization of this data which 
is one-to-one with the slow variable x±. 
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Figure 4: Comparison of using the Euclidean distance and the Mahalanobis distance in mul¬ 
tiscale data mining, (a) The data from Figure 3, colored by the first diffusion map coordinate 
when using the Euclidean distance in the kernel in (4.1). Note that we do not recover the 
slow variable, (b) The data from Figure 3, colored by the first diffusion map coordinate when 
using the Mahalanobis distance in the kernel in (4.1). The good correspondence between this 
coordinate and the slow variable is visually obvious. 


6.1. Linear function. In the first example, our observation function f will be the identity 
function, 


( 6 . 2 ) 


yi(t) 

mi*) 


f (x(t)) = 


g(y (*))= f _1 (y (*)) = 


Xl(t) 

X2 [t) 

yii*) 
.2/2 it) 


where the fast and slow variables remain uncoupled. In this case, there is no error incurred 
due to the measurement function f (Em = 0), as the second- and higher-order derivatives of 
g are identically 0. 

6.1.1. Importance of using the Mahalanobis distance. We want to demonstrate the 
utility of using the Mahalanobis distance compared to the typical Euclidean distance. We 
compute the diffusion map embedding for the data in Figure 3, using both the standard 
Euclidean distance and the Mahalanobis distance for the computation of the kernel in (4.1). 
The data, colored by <f>\ using the two different metrics, are shown in Figure 4. When using 
the standard Euclidean distance which does not account for the underlying dynamics, the first 
diffusion maps recovers the fast variable X 2 , suggesting the fast modes is the dominant scale 
purely in terms of data analysis (Figure 4(a)). In contrast, the slow variable is recovered when 
using the Mahalanobis distance, as the coloring in Figure 4(b) (where the data are colored by 
the first diffusion maps variable) is consistent with the coloring in Figure 3 (where the data 
are colored by time). 
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Figure 5: Errors in covariance estimation for linear example, (a) The analytical contributions 
to the covariance for the example in Section 6.1 as a function of 5t. (b) The average estimated 
covariance ||C|| as a function of 5t. The average is computed over 10 data points and using 50 
sample points to estimate each covariance. The shaded yellow region indicates the range of 
5t over which the errors in the estimated covariance are less than the norm of the covariance. 


6.1.2. Errors in covariance estimation. For the example in (6.2), the analytical covari¬ 
ance is 


(6.3) C(x(t)) = 

From (5.8), we find 


(6.4) 


E c (x(f),5f) 


0 

0 



+ 0(6t 3/2 ). 


Therefore, ||C|| = O Q) and ||Ec|| = O (|j) (provided VSt', this will be discussed 

further in Section 6.1.3). These terms are shown in Figure 5(a) as a function of 5t. We want 
to choose 5t in a regime where ||Ec|| -C ||C|| (the yellow shaded region in Figure 5 indicates 
where ||Ec|| < ||C||), so that the errors in the estimated covariance are small with respect to 
the covariance. 

When we do not analytically know the functions f or g, we can find such a regime em¬ 
pirically by estimating the covariance for several values of St. This provides an estimate of 
C = C + Ec as a function of 5t. From Figure 5(a), we expect a “knee” in the plot of ||C|| 
versus 5t when ||Ec|| becomes larger than ||C||. Figure 5(b) shows the empirical ||C|| as a 
function of 5t for the data in Figure 3, and the knee in this curve is consistent with the 
intersection in Figure 5(a). 

6.1.3. Recovery of the fast variable. Note that, for the example in (6.2), Ec is a constant 
diagonal matrix. Therefore, taking 5t too large will not lead to nonlinear effects or mixing of 
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the fast and slow variables. Rather, changing 5t will only affect the perceived ratio of the fast 
and slow timescales. 

To see this behavior in our diffusion maps results, we must first discuss the interpretation 
of the diffusion maps eigenspectrum. The diffusion maps eigenvectors provide embedding co¬ 
ordinates for the data, and the corresponding eigenvalues provide a measure of the importance 
of each coordinate. However, some eigenvectors can be harmonics of previous eigenvectors; 
for example, for a data set parameterized by a variable x, both cosx and cos 2a; will appear 
as diffusion maps eigenvectors (see [15] for a more detailed discussion). These harmonics 
do not capture any new direction within the data set, but do appear as additional eigen¬ 
vector/eigenvalue pairs. Therefore, for the two-dimensional data considered here, the fast 
variable will not necessarily appear as the second (non-trivial) eigenvector. As the time scale 
separation increases, the relative importance of the slow and fast directions will also increase. 
This implies that the eigenvalue corresponding to the eigenvector which parameterizes the fast 
direction will decrease, and the number of harmonics of the slow mode which appear before 
the fast mode will increase. 

Figure 6 shows results for three different values of 5t (the corresponding values are in¬ 
dicated by the dashed lines in Figure 5). When the time scale of the simulation burst used 
to estimate the local covariance (indicated by the red clouds in the top row of figures), is 
sufficiently shorter than that of the equilibration time of the fast variable, the estimated lo¬ 
cal covariance is accurate and the fast variable is collapsed significantly relative to the slow 
variable. This means that the fast variable is recovered very far down in the diffusion maps 
eigenvectors. The left two columns of Figure 6 show that, for this example, when the sim¬ 
ulation burst is shorter than the equilibration time, the fast variable is recovered as <^>io- 
However, if the time scale of the burst is longer than the saturation time of the fast variable, 
the estimated covariance changes: the variance in the slow direction continues to grow, while 
the variance in the fast direction is fixed. This means that the apparent time scale separation 
is smaller, the collapse of the fast variable is less pronounced relative to the slow variable, 
and the fast variable is recovered in an earlier eigenvector (in our ordering of the spectrum). 
The right column of Figure 6 shows that, when the burst is now longer than the equilibration 
time, the fast variable appears earlier in the eigenvalue spectrum and is recovered as fe. 

6.2. Nonlinear observation function. In the second example, our data will be warped 
into “half-moon” shapes via the function 


yi(t) 
. 2/2 (t). 

= f ( x W) = 

x\ (t) + x|(i) 
X2 (t) 

g(y(f))= f 1 (y(t)) = 

yi(t) - vl{t) 
2/2 (t) 


Figure 7 shows the data from Figure 3 transformed by the function f in (6.5) and colored 
by time. It is important to note that this is a difficult class of problem in practice, as none 
of the observed variables are purely fast or slow, and the observed system appears, at first 
inspection, to possess no separation of time scales. For this example, the analytical covariance 
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and inverse covariance are 

1 e + 4 x 2 (t) 2x2 (t) 
e 2 x 2 (t) 1 

1 -2 x 2 (t) 

—2 x 2 (t) e + 4 xl(t) 

The fast and slow variables are now coupled through the function f, and the Euclidean 
distance is not informative about the fast or the slow variables. We need to use the Maha- 
lanobis distance to obtain a parametrization that is consistent with the underlying fast-slow 
dynamics. 

6.2.1. Errors in Mahalanobis distance. We can bound the Mahalanobis distance by the 
eigenvalues of Ct, 

(6.7) A ct ,il|y(* 2 ) - y(ti)||I < llyte) - y(ti)\\ 2 M < A c t, 2 ||y(^) - y(ti)||l 

where A^t i < Agy 2 are the two eigenvalues of Cl. Therefore, for the example in (6.5), we 
have 


( 6 . 6 ) 


C(x(f)) 

C f (x(f)) 


(6-8) E M (y(ti),y(t 2 )) = -(y 2 (t 2 ) - 2 / 2 (H)) 4 . 


Figure 8(a) shows \\y{t 2 ) - y{h)\\ 2 M and \Em\ as a function of ||y(i 2 ) — y(ti)|| 2 . The Ma¬ 
halanobis distance is an accurate approximation to the true intrinsic distance ||z(t 2 ) — z (^i )||2 
when \E M \ < ||y(*2) — y(*i)|li^ (the shaded yellow region in the plot indicates where \Em\ < 
l|y(* 2 ) - y(ti)llM)- We want to choose o-l ernel in a regime where \EM(y(ti),y(t 2 ))\ <C 
lly(^) - y(*i)|lM> so that the distances we utilize in the diffusion maps calculation are ac¬ 
curate. We can find such a regime empirically by plotting ||y(f 2 ) — y(^i)|lM as a function of 
||y(t 2 ) — y(ii)|| 2 , and assessing when the relationship deviates from quadratic. This is shown 
in Figure 8(b), and the deviation from quadratic behavior is consistent with the intersection 
of the analytical expressions plotted in Figure 8(a). 

Figures 9(a) and (b) show the data from Figure 7, colored by <f>i for two different values of 
a kernel- The corresponding values of (j\ erne i are indicated by the dashed lines. When & 2 erne i 
corresponds to a region where \Em\ ||y(^ 2 ) ~ y(ti)|||f, 4>i is well correlated with the slow 
variable. However, when cr^ ernel corresponds to a region where \Em\ ^ ||y(* 2 ) — y(^i)||the 
slow variable is no longer recovered. 


6.2.2. Errors in covariance estimation. From (5.8), we find that, for the example in (6.5), 


(6.9) 


2 St 8 x 2 (t) 


E 


7^0,12 (x(t), =E c ,2i{x(t),5t) 

x 2 (t)8t 2 

e 2 e 2 St 

= - S 4 + 0(M 3/2 ) 


/>t-\-St / />t-\-5t />S2 \ 

J [j 2 x 2 (si)ds\ + J x 2 {si)ds\ J ds 2 


+ 0 ( 6 t 3/2 ) 


E 


rt+St / />t+8t PS 2 \ 

J IJ 2 x 2 (si)dsi+ J x 2 (si)ds\ J ds 2 


+ (D( 6 t 3/2 ) 
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The error in the covariance is O (^). As expected, the error grows with increasing 5t. We can 
also see the explicit dependence of the covariance error on the time scale separation e; larger 
time scale separation results in a larger covariance error, as a more refined simulation burst 
is required to estimate the covariance of the fast directions. ||C|| and ||Ec|| are plotted as a 
function of 5t in Figure 8(c); the shaded yellow portion denotes the region where ||Ecj| < ||C||. 
As in the previous example, we can empirically find where ||Ec|| <C ||C|| by plotting ||C|| as 
a function of 5t and looking for a knee in the plot. These results are shown in Figure 8(d). 

Figures 9(a) and (c) show the data from Figure 7, colored by <f>\ for two different values 
of 5t. The corresponding values of 5t are indicated by the dashed lines in Figure 8(c) and (d). 
When 5t corresponds to a region where ||Ec|| <C ||C||, the slow variable is recovered by the first 
diffusion maps coordinate. However, when 5t corresponds to a region where ||Ec|| 3> ||C||, 
the slow variable is no longer recovered. 

7. Conclusions. We have presented a methodology to compute a parametrization of a 
data set which respects the slow variables in the underlying dynamical system. The approach 
utilizes diffusion maps, a kernel-based manifold learning technique, with the Mahalanobis 
distance as the metric. We showed that the Mahalanobis distance collapses the fast directions 
within a data set, allowing for successful recovery of the slow variables. Furthermore, we 
showed how to estimate the covariances (required for the Mahalanobis distance) directly from 
data. A key point in our approach is that the embedding coordinates we compute are not 
only insensitive to the fast variables, but are also invariant to nonlinear observation functions. 
Therefore, the approach can be used for data fusion: data collected from the same system via 
different measurement functions can be combined and merged into a single coordinate system. 

In the examples presented, the initial data came from a single trajectory of a dynamical 
system, and the local covariance at each point in the trajectory was estimated using brief 
simulation bursts. However, the initial data need not be collected from a single trajectory, 
and other sampling schemes could be employed. Brief time series are required to estimate the 
local covariances, but given a simulator, one could reinitialize brief simulation bursts which 
are sufficiently short and refined from each sample point. 

In our examples, we controlled the time scale of sampling and could therefore set the time 
scale over which to estimate the covariance and the simulation time step arbitrarily small. 
However, in some settings, such as previously collected historical data, it is not uncommon to 
have a fixed sampling rate and be unable to reinitialize simulations. In such cases, it is possible 
that we cannot find an appropriate kernel scale given the fixed 5t such that we can accurately 
recover the slow variables. For these cases, the data cannot be processed as given, and it 
is necessary to construct intermediate observers, such as histograms, Fourier coefficients, or 
scattering transform coefficients [27, 38, 39]. Such intermediates are more complex statistical 
functions than simple averages and can capture additional structure within the data. They 
also reduce the effects of noise and permit a larger time step. However, constructing such 
intermediates often requires additional a priori knowledge about the system dynamics and 
noise structure. 

Clearly, in our analysis, we have ignored the finite sampling effects in our estimation. In 
reality, both the number of samples used to estimate the covariances, as well as the density of 
sampled points on the manifold, affect the recovered parametrization and provide additional 



16 


C. J. DSILVA ET AL 


constraints on 5t and crkernel- Future work involves extending our analysis to the finite sample 
case, and providing guidelines for the amount of data required to apply our methodology. 

The methods presented here provide a bridge between traditional data mining and multiple 
time scale dynamical systems. With this interface established, one can now consider using 
such data-driven methodologies to extract reduced models (either explicitly, or implicitly via 
an equation-free framework [14, 21, 22, 23]) which also respect the underlying slow dynamics 
and geometry of the data. Such reduced models hold the promise of accelerated analysis and 
reduced simulation of dynamical systems whose effective dynamics are obscure upon simple 
inspection. 
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Figure 6: Relationship between changing 6 t and recovery of the variables. From left to right, 
the columns correspond to 6 t = 10~ 6 ,10~ 5 , ICR 3 . (Row 1) Data (gray) and representative 
burst (red) used to estimate the local covariance. (Row 2) Correlation between first diffusion 
maps coordinate and the slow variable x\. (Row 3) Correlation between the relevant diffusion 
maps coordinate and the fast variable X 2 ■ Note that for 5t = 10~ 6 and 5t = 10“ 5 , X 2 is 
correlated with 0io- When 5t = 10 -3 , X 2 is correlated with (f> q. (Row 4) Diffusion maps 
eigenvalue spectra. The eigenvalues corresponding to the coordinates for the slow and fast 
modes are indicated by red circles. Note that when 5t is too large, the apparent time scale 
separation decreases and the coordinate corresponding to the fast variable appears earlier in 
the spectrum. 
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Figure 7: The data from Figure 3, transformed by f in (6.5) 
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(c) (d) 

Figure 8: Errors in the Mahalanobis distance and the covariance estimation for the nonlinear 
example in (6.5). (a) The analytical expressions for the contributions to the distance ap¬ 
proximation as a function of ||y 2 — yiH 2 - (b) The average estimated Mahalanobis distance 
||y 2 — yi || 2 m as a function of the distance ||y 2 - yiH 2 - The line ||y 2 - yi\\ 2 M = ||y 2 - yilb is 
shown for reference. The yellow region indicates the range in which a kernel should be chosen, 
(c) The analytical expressions for the contributions to the covariance as a function of 5t. (d) 
The average estimated covariance ||C|| as a function of 5t. The yellow region indicates the 
range of 5t over which the errors in the estimated covariance are small relative to the norm 
of the covariance. 
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Figure 9: Data from Figure 7, colored by the first diffusion maps variable <j)± using the 
Mahalanobis distance for three different parameter settings. The relevant values of St and 
cr kernel are indicated by the red dashed lines on the corresponding plots. (Row 1) St = 10”' 
and (^kernel = 10” 2 . Note that the parametrization is one-to-one with the slow variable. 
(Row 2) St = 10”' and &\ erne i = 10 1 . We do not recover the slow variable because (Jkemei is 
too large. (Row 3) St = 10” 3 and cr| errieZ = 10” 2 . We do not recover the slow variable because 
St is too large. 




























